[Speculative Decoding] fix mtp stop_seqs and limit thinking bugs #7166

Open
lonelygsh wants to merge 1 commit into PaddlePaddle:develop from lonelygsh:fix-speculate-decoding-index-bugs

@lonelygsh lonelygsh commented Apr 2, 2026

Motivation

This PR fixes index errors in two speculative decoding kernels, speculate_set_stop_value_multi_seqs and speculate_limit_thinking_content_length, caused by a change in the semantics of step_idx.

Modifications

speculate_set_stop_value_multi_seqs

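Fix the can_stop check: step_idx_now >= min_token_limit → step_idx_now + accept_num >= min_token_limit, because step_idx no longer includes the tokens accepted in the current round.
Fix the skip condition: step_idx_now - accept_num + accept_idx + 1 < stop_seq_len → step_idx_now + accept_idx + 1 < stop_seq_len, dropping the -accept_num offset left over from the old semantics.
Fix the accept-token routing condition: stop_seq_len - 1 - i < accept_idx → stop_seq_len - 1 - i <= accept_idx, so that accept_idx maps directly to the position of the accepted token that ends the stop sequence, which makes the semantics clearer.
Fix the accept_tokens index: remove the redundant -1 offset.
Fix the pre_ids_idx calculation: step_idx_now - accept_num + accept_idx - offset → step_idx_now + accept_idx - offset, dropping the -accept_num offset left over from the old semantics.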
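
The index arithmetic listed above can be sketched in Python as follows. This is a simplified model rather than the kernel itself: it assumes the history (pre_ids) is a 0-indexed list holding exactly step_idx tokens, the helper names can_stop and stop_seq_ends_at are hypothetical, and the kernel's loop bounds and eos write are not modeled.

```python
from typing import Sequence


def can_stop(step_idx: int, accept_num: int, min_token_limit: int) -> bool:
    """Gate from the PR: step_idx excludes this round, so accept_num is added."""
    return step_idx + accept_num >= min_token_limit


def stop_seq_ends_at(
    history: Sequence[int],
    accept_tokens: Sequence[int],
    accept_idx: int,
    stop_seq: Sequence[int],
) -> bool:
    """Return True if stop_seq ends exactly at accept_tokens[accept_idx]."""
    # Assumption: history holds exactly the step_idx tokens before this round.
    step_idx = len(history)
    stop_seq_len = len(stop_seq)

    # Skip condition: too few tokens so far to contain the stop sequence.
    if step_idx + accept_idx + 1 < stop_seq_len:
        return False

    # Walk the stop sequence backwards from its last token.
    for i in range(stop_seq_len - 1, -1, -1):
        offset = stop_seq_len - 1 - i  # distance back from accept_idx
        if offset <= accept_idx:
            # Routing condition: the token was accepted in this round.
            token = accept_tokens[accept_idx - offset]
        else:
            # Otherwise read from history: step_idx + accept_idx - offset.
            token = history[step_idx + accept_idx - offset]
        if token != stop_seq[i]:
            return False
    return True
```

For example, with history [10, 11, 12], accepted tokens [13, 14, 15], and stop sequence [12, 13], the match ends at accept_idx 0: the last stop token comes from accept_tokens[0] and the first from history[2].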

speculate_limit_thinking_content_length

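Fix the current_base_step calculation: step_idx[bid] - original_accept_num + 1 → step_idx[bid] + 1, to match the new step_idx semantics.
Remove the step_idx rollback logic: step_idx is no longer modified when accept_num is truncated.
Make the step_idx parameter const: the kernel no longer writes to step_idx, so the const_cast at the call site is removed.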
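
A minimal Python sketch of the new base-step calculation and truncation is shown below. It is an illustration under assumptions, not the kernel: the function name limit_thinking, the parameters max_think_len and think_end_id, and the `>=` threshold rule are hypothetical, and the kernel's status tracking is omitted.

```python
from typing import List, Tuple


def limit_thinking(
    step_idx: int,
    accepted: List[int],
    max_think_len: int,
    think_end_id: int,
) -> Tuple[List[int], int]:
    """Return (tokens, new_accept_num); step_idx is only read, never updated."""
    # New semantics: step_idx counts history before this round, so the first
    # accepted token of this round sits at step_idx + 1.
    current_base_step = step_idx + 1

    for token_offset in range(len(accepted)):
        current_step = current_base_step + token_offset
        # Assumed rule: once the limit is reached, force the end-of-thinking
        # token and discard the remaining accepted tokens.
        if current_step >= max_think_len:
            truncated = accepted[:token_offset] + [think_end_id]
            return truncated, token_offset + 1

    # No truncation: every accepted token is kept.
    return list(accepted), len(accepted)
```

With step_idx 3, accepted tokens [7, 8, 9], and a limit of 5, the second token reaches the limit, so the result is ([7, 99], 2) when think_end_id is 99. The caller's step_idx is left untouched, which mirrors the removal of the rollback logic.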

Tests

Update test_speculate_set_stop_value_multi_seqs.py, adapting its indexing and matching logic to the new step_idx semantics.

Usage or Command

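No new interfaces; this PR only fixes existing logic. It can be verified by running speculative decoding inference and checking that stop-sequence truncation and the thinking-length limit behave correctly.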

Accuracy Tests

Unit tests pass.

Checklist

  • Add at least a tag in the PR title.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Updated test_speculate_set_stop_value_multi_seqs.py.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot bot commented Apr 2, 2026

Thanks for your contribution!

@paddle-bot paddle-bot bot added the contributor External developers label Apr 2, 2026

CLAassistant commented Apr 2, 2026

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


guanshihui seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@lonelygsh lonelygsh force-pushed the fix-speculate-decoding-index-bugs branch from ba88df0 to 0f4325c Compare April 2, 2026 13:37
@lonelygsh lonelygsh force-pushed the fix-speculate-decoding-index-bugs branch from 0f4325c to 41a8185 Compare April 2, 2026 13:40
@lonelygsh lonelygsh force-pushed the fix-speculate-decoding-index-bugs branch from 41a8185 to 8dea198 Compare April 2, 2026 13:42

codecov-commenter commented Apr 2, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (develop@bb1f977). Learn more about missing BASE report.

Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #7166   +/-   ##
==========================================
  Coverage           ?   73.62%           
==========================================
  Files              ?      383           
  Lines              ?    53513           
  Branches           ?     8378           
==========================================
  Hits               ?    39401           
  Misses             ?    11361           
  Partials           ?     2751           
Flag Coverage Δ
GPU 73.62% <ø> (?)

@lonelygsh lonelygsh changed the title [Speculative Decoding] fix mtp stop_seqs bugs [Speculative Decoding] fix mtp stop_seqs and limit thinging bugs Apr 3, 2026
@lonelygsh lonelygsh changed the title [Speculative Decoding] fix mtp stop_seqs and limit thinging bugs [Speculative Decoding] fix mtp stop_seqs and limit thinking bugs Apr 3, 2026
yuanlehome previously approved these changes Apr 3, 2026
…_stop_value kernels

- speculate_limit_thinking_content_length: update current_base_step to
  step_idx+1 (step_idx now records history count before current round);
  remove incorrect step_idx decrement on accept_num truncation; mark
  step_idx param as const.
- speculate_set_stop_value_multi_seqs: fix can_stop gate to use
  step_idx_now+accept_num>=min_token_limit; fix skip check and pre_ids_idx
  formula (remove stale -accept_num offset); use <= condition so accept_idx
  maps directly to the accepted token that ends the stop sequence; fix
  accept_tokens index (remove -1).
- Update unit tests for speculate_set_stop_value_multi_seqs kernel.
@lonelygsh lonelygsh force-pushed the fix-speculate-decoding-index-bugs branch from a0be6ee to 99b5c45 Compare April 8, 2026 08:15
@fastdeploy-bot fastdeploy-bot left a comment

🤖 AI Code Review | 2026-04-08

📋 Review Summary

PR overview: Fixes index errors in two speculative decoding kernels caused by the change in step_idx semantics.

Scope of changes: custom_ops/gpu_ops/speculate_decoding/ (2 CUDA kernels + 1 test file)

Impact tags: [Speculative Decoding] [BugFix]

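PR Convention Check

The PR meets the conventions:

  • ✅ The title includes the [Speculative Decoding] tag
  • ✅ The description includes Motivation and Modifications
  • ✅ Notes on the test changes are provided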

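Issues

No blocking issues found.

Overall Assessment

This PR fixes index calculation errors that arose after the semantics of step_idx changed from "includes the current round's tokens" to "excludes the current round's tokens". Based on code analysis, the fixes in both kernels are correct: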

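  1. speculate_set_stop_value_multi_seqs.cu

    • can_stop check: step_idx_now >= min_token_limit → step_idx_now + accept_num >= min_token_limit
    • The skip condition, accept-token routing, and index calculations all correctly drop the -accept_num offset left over from the old semantics ✓
    • Adds a boundary guard, accept_idx <= accept_num - 2, to prevent an out-of-bounds write of eos ✓
  2. speculate_limit_thinking_content_length.cu

    • The current_base_step calculation is fixed correctly ✓
    • The step_idx rollback logic is removed, consistent with read-only semantics ✓
    • Changing the parameter to const int64_t* is semantically correct ✓
  3. Test coverage is sufficient

    • The reference implementation is updated to match the CUDA kernel logic
    • Adds test_stop_seq_at_last_position_not_detected to verify boundary behavior
    • All assertions match the expected output under the new semantics

It has been confirmed that other modules using step_idx (speculate_verify.cu, unified_update_model_status.cu, etc.) have consistent semantics and do not need corresponding changes.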

Labels

contributor External developers

5 participants